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Abstract 

Important thermodynamic parameters induding denaturant equilibrium m 
values (/77 eg ) and heat capacity changes (ACp) can be predicted based on 
changes in Solvent Accessible Surface Area (SASA) upon unfolding. Crosslinks 
such as disulfide bonds influence the stability of the proteins by decreasing the 
entropy gain as well as reduction of SASA of unfolded state. The aim of the 
study was to develop mathematical models to predict the effect of crosslinks 
on AS AS A and ultimately on m eq and ACp based on in silico methods. 
Changes of SASA upon computationally simulated unfolding were calculated 
for a set of 45 proteins with known m eq and ACp values and the effect of 
crosslinks on ASASA of unfolding was investigated. The results were used to 
predict the m eq of denaturation for guanidine hydrochloride and urea, as well 
as ACp for the studied proteins with overall error of 20%, 31% and 17%, re- 
spectively. The results of the current study were in close agreement with those 
obtained from the previous studies. 
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Introduction 

Through the human genome project we 
now know that a human cell can synthesize 
about 20,000 to 25,000 different proteins 
Proteins are an important class of biological 
macromolecules present in all biological or- 
ganisms, and constitute high proportion of the 
dry mass of all cells (2) . Most of the biological 
processes in all cells are executed by proteins. 
The amino acid sequence of a protein contains 
all information needed for adopting its three- 
dimensional structure. However, misfolding 
does occur, even though help from other mo- 
lecules, such as chaperons, for correct and fast 
in vivo folding are in place (3 5) . 

Denaturation studies are very useful for in- 
vestigating the thermodynamic properties of 
proteins. Transition from native to denatured 



states can be brought about by changing the 
properties of protein's environment. In gen- 
eral, this can be done by increasing the tem- 
perature, adding chemical denaturants or 
changing the pH. 

Urea and guanidinium ion (used in the form 
of guanidinium chloride-GdnHCI) favor the 
denatured state by increasing the solubility of 
the unfolded chain in an aqueous solution. In 
comparison to temperature denaturation, che- 
mical denaturation is often a reversible pro- 
cess. This is possible since the hydrophobic 
groups of the unfolded chain are shielded by 
the denaturants, which prevent aggregation. 

The unfolding free energy (AGu) depends 
linearly on the denaturant concentration as: 
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AGjj = AGjj - mjj [Denaturant j (Eq.l) 

Where is AG^ 2 ° the free energy of un- 
folding in the absence of denaturant and mu 
denotes the dependency of free energy on de- 
naturant concentration (i.e. m eq ) (6) . A good 
linearity is observed at high denaturant con- 
centrations and AG^ 2 ° is obtained by extra- 
polation to the zero concentration of denatur- 
ant. AG^ 2 ° values calculated from guanidin- 
ium chloride and urea denaturation are in very 
good agreement (7) which gives this relation 
some further credibility. 

One of the major challenges in the field of 
protein science is to predict the stability and 
function of proteins from their primary struc- 
tures. To accomplish this task, efficient algo- 
rithms are needed to relate the structure to sta- 
bility. The availability of about 77,000 protein 
structures in Protein Data Bank (PDB) (8) and 
a great deal of experimental works on the 
thermodynamic stability of proteins have 
provided a wealth of information which can 
be used for the development of empirical 
functions that relate thermodynamic and 
structural parameters. 

The success of such approach in developing 
structure-based methods to predict various 
thermodynamic parameters that define the 
Gibbs energy, i.e., the enthalpy, entropy and 
heat capacity changes, has been shown previ- 
ously (913) . In the process of unfolding, the 
major contribution to the enthalpy change 
arises from the disruption of intramolecular 
interactions such as van der Waals and hydro- 
gen bonds and also solvation of the interact- 
ing groups. Therefore, the change in solvent 
accessible surface area (ASASA) upon unfold- 
ing has been used as a mean for predicting the 
AH as presented below: 

AH = Z a i (p) • ASASA i (Eq.2) 
i 

Where ASASAi is the change in SASA of 
atom / upon unfolding, and a i (p) is a coef- 



ficient that depends on the atom type and the 
average packing density of that atom within 
the protein (14) . 

The heat capacity change (AC P ) in protein 
unfolding largely arises from changes in the 
hydration of groups that are buried in the 
native form away from the surrounding aque- 
ous environment. ACp is correlated to the 
changes in SASA upon unfolding, as shown in 
the following equation: 

AC p =Ta r ASASAi (Eq.3) 

Where a z is the contribution of atom / per 
unit area and ASASAi is as defined above. 
Using both equations, good correlations were 
obtained between experimental and calculated 
AH and ACp values (14) . 

The aim of current study is to develop em- 
pirical models to account for the effect of 
crosslinks on ASASA and hence on thermody- 
namic parameters (i.e., m eq and ACp) of 
protein unfolding based on computational 
approach. 

Materials and Methods 

Databases and programs 

The experimental m eq values for urea and 
GdnHCl denaturation, as well as ACp dena- 
turations for a set of 45 proteins used in this 
study were from Myers et al (10) . The three-di- 
mensional (3D) structures of the studied pro- 
teins were obtained from Protein Data Bank 
(http://www.rcsb.org/) at RCSB (8) . 

The SASA of the proteins in folded and un- 
folded forms were calculated using DSSP 
program implemented in GROMACS pack- 
age. The DSSP program was designed by 
Wolfgang Kabsch and Chris Sander to stand- 
ardize secondary structure assignment based 
on a database of secondary structure for pro- 
tein entries in the PDB (15) . 

Swiss-Pdb Viewer (SPDBV, version 3.7, 
Swiss Institute of Bioinformatics), an inter- 
active molecular graphics program was used 
for viewing and analyzing protein structures 
(16) . HyperChem (version 7.1; 2002; Hyper- 
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cube Inc.) is the other molecular modeling 
software used in this study. 

GROMACS (version 3.3, University of 
Groningen, The Netherlands, currently main- 
tained by ScalaLife), an engine to perform 
molecular dynamics simulations and energy 
minimization (17) was used under Linux oper- 
ating system (Fedora core 5) on a cluster con- 
sisting of 8 nodes each with two dual-core 
Opteron 2212 CPUs and 2 GB RAM. 

Unfolding the proteins 

The unfolded states of the proteins were 
achieved by three different approaches; (/) 
building the fully extended conformation of 
the protein, (it) instantaneously assigning 
standard bond lengths, bond angles, torsion 
angles, and stereochemistry properties to the 
model structure using a given force field 
method, or (Hi) molecular dynamics simu- 
lation. 

Fully extended conformation 

SPDBV was used to upload the sequence of 
the protein saved in FASTA format. Then the 
sequence was folded into an extended confor- 
mation by setting phi (cp) and psi (\|/) angles to 
those corresponding with P-pleated strand. It 
is clear that in such a conformation there is no 
crosslink in the generated model even if the 
native form of protein consists such con- 
straints. 

Instantaneous unfolding using standard bond 
and angle assignment 

HyperChem program was used to open the 
crystal structure (native form) of protein. In 
this way, the disulfide bonds are lost. If the 
re-establishment of crosslinks was desired, 
first the residues involved in the crosslink 
were selected and then the necessary bonds 
were created between the sulfur atoms in- 
volved in the disulfide bonds. Subsequently, 
the structure was forced to unfold into a ran- 
dom coil losing its regular structures while 
preserving the crosslinks. The unfolded struc- 
tural model was energy minimized using the 
molecular mechanics force field. 

The minimization protocol employed the 
steepest descent method using BIO+, the 



HyperChem implementation of CHARMM 
(Chemistry at HARvard using Molecular 
Mechanics) force field (18) , until the difference 
in energy after two consecutive iterations was 
less than 0.1 kcal/mol. The model structures 
were stored as unfolded states and their SASA 
were calculated as described above. In the 
case of heme containing proteins, two bonds 
were built linking the chelating atoms to the 
central iron atom. This effectively constrains 
the spatial distance between two residues to 
which the iron atom of the heme group is 
linked through coordination of the unpaired 
electrons of nitrogen or sulfur atoms. 

Unfolding using molecular dynamics simulation 
In order to unfold proteins using Molecular 
Dynamics (MD) simulation technique, the 
following steps were performed. First, the 
native structure was downloaded from PDB at 
RCSB and converted into standard Gromacs 
file format. The positions of all hydrogen 
atoms were reconstructed. Subsequently, the 
protein structure was energy minimized in 
vacuum using steepest descent algorithm until 
the maximum force was smaller than 1.0 kJ 
mol^nm 1 . 

GROMOS-96, the officially distributed 
force field for Gromacs, was used for molecu- 
lar mechanics simulations as implemented in 
the software package (19) . Then a simulation 
box was created and protein was centred into 
it. The simulation box was filled in by Simple 
Point Charge (spc216) water and urea mo- 
lecules. The final concentration of the urea in 
the box was about 4.4 M. Before running the 
MD simulation, the system was neutralized by 
adding appropriate number of either Na + or 
Cl~ counter ions to have zero net charge. 
Ultimately, the solvated protein was subjected 
to MD simulation for 10 ns at 500 X and the 
trajectories were saved every 0.02 ns. 

SASA nc - SASA 
Cross linking number = (Eq.4) 

n 

1 N 

Crosslinking factor = — I crosslinking number (Eq.5) 
N i 
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Crosslinking factor (CLF) 

In order to investigate the effect of cross- 
linking on the unfolding behaviour of a pro- 
tein, an index named Crosslinking Factor 
(CLF) was defined as follows: 

Where SASA refers to solvent accessible 
surface area of unfolded conformation and the 
subscripts c and nc denote whether the cross- 
links are preserved or not in the unfolded con- 
formation, respectively. The value of n equals 
the number of crosslinks present in any of 
those proteins studied here which have cross- 
links in the native form. N is the number of 
proteins with crosslinks, and / denotes any of 
the studied proteins used to derive CLF value. 

Statistical treatment 

Validation of models: Statistical analyses 
were performed by SPSS (SPSS for windows 
version 11.5, IBM) and Excel (Microsoft 
Office 2007) programs. Predictive power of 
the mathematical models were evaluated by 
excluding one of the data points, i.e. one of 
the proteins from the data set of 45 proteins 
listed in table 1, and training the model based 
on the remaining proteins and subsequently 
predicting the value of thermodynamic para- 
meter for the excluded protein. This was con- 
tinued until all proteins were used for the pre- 
diction. 

The Standard Deviation of Error of Pre- 
diction (SDEP) was calculated to give a 
measure for the distribution of the errors in- 
volved in the predictions using the following 
equation: 

(77 \ 7 

\ A exp A calc ) 
SDEP = |— (Eq.6) 

Here A exp and A ca i c are predicted values, 
respectively. N denotes the number of data 
points. 

Mean absolute percentage error (MAPE) 

To evaluate the accuracy of predictions, ab- 
solute percentage errors were calculated based 
on the following equations: 



A C alc ~ ^exp 
APE = x 100 (Eq.7) 

A 

exp 

Where A ca \ c and A exp are the calculated and 
experimental values for a given parameter of 
interest, such as AC P , m eq for GdnHCl or urea. 
The average of APE over all data points for 
each of the above mentioned parameters was 
calculated and called MAPE. 

1 N 

MAPE = — S APE i (Eq.8) 
N i 

Where N is the number of data points. 

Results and Discussion 

The changes in solvent accessible surface 
area (ASASA) upon unfolding, as determined 
by the differences in solvent accessibilities of 
native form (calculated from the crystal struc- 
ture) and denatured form (modeled by an ex- 
tended polypeptide chain) are given for a set 
of 45 proteins in table 1 . The table also shows 
m eq values from denaturation experiments, 
ACp of unfolding, number of residues as well 
as crosslinks present in each of these proteins 
taken from the compilation made by Myers et 
al (10) . 

Figures 1A and IB demonstrate depend- 
encies that exist between the denaturants m eq 
values and the changes in the solvent acces- 
sible surface area upon unfolding. There are 
significant linear correlations in both cases, 
with the correlation coefficient (R) values of 
0.85 and 0.87 for GdnHCl and urea, respect- 
ively. The slopes of the linear regression lines 
are 0.25 and 0.17 call (mol.M.A 2 ) for GdnHCl 
and urea, respectively, indicating the stronger 
denaturing effects of GdnHCl. 

Denaturation heat capacity changes (ACp) 
were also correlated with the ASASA strongly 
with the correlation coefficient of 0.97 as 
shown in figure 1C. The same linear correl- 
ations between m eq values and ASASA have 
been shown previously by Myers et al (10) . 
ASASA has been also related linearly to ACp 
by others (20 ' 21) . 
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Table 1. Characteristics of 45 proteins that have m eq values and crystal structures available 3 . 



Protein name 


PDB 


Number of 


Number of 


/M^(GdnHcl) 


fM^(Urea) 


ACp 


kisikjs* folded 


SASA f M a b 

<J/»».J/« unfolded 


AS AS A 


Residues 


crosslinks 


cal/(mol.M) 


Ovomucoid third domain (turkey) 


1CHO 


53 d 


3 


580 


250 


590 


3735 


7157 


3422 


lgG binding domain of protein G 


1PGB 


56 


0 


1800 


NA 


620 


3752 


7705 


3953 


BPT1 (A30, A51) 


7PTI 


58 


2 


1200 


NA 


NA 


3969 


8254 


4285 


BPT1 (V30, A51) 


1AAL 


58 


2 


1500 


NA 


NA 


3993 


8276 


4283 


SH3 domain of a-spectrin 


1SHG 


57 d 


0 


1880 


766 


813 


3925 


8220 


4295 


Chymotrypsin inhibitor 2 


2CI2 


65 d 


0 


1890 


NA 


720 


4564 


9246 


4682 


Calbindin D9K 


1IG5 


75 


0 


NA 


1140 


NA 


4774 


10373 


5599 


Ubiquitin 


1UB1 


76 


0 


NA 


1140 


NA 


4911 


10758 


5847 


HPr (B. subtilis) 


2HPR 


87 d 


0 


NA 


1050 


1160 


4751 


11688 


6937 


Barstar 


1BTA 


89 


0 


2400 


1250 


1460 


5653 


12596 


6943 


Lambda repressor (N-terminal) 


1LMB 


102 


0 


2400 


1090 


NA 


6270 


13013 


6743 


Cytochrome c (tuna) 


5CYT 


103 


1 


2800 


NA 


NA 


6087 


14382 


8295 


Cytochrome c (horse heart) 


2PCB 


104 


1 


3010 


1200 


1730 


6363 


14812 


8449 


Ribonuclease Tl 


9RNT 


104 


2 


2560 


1210 


1270 


5467 


13651 


8184 


Arc repressor 11 


1PAR 


106 


0 


3270 


1910 


1600 


6566 


15471 


8906 


FK binding protein (human) 


1FKD 


107 


0 


NA 


1460 


NA 


6144 


14798 


8654 


Iso-I-cytochrome c (yeast) 


1YCC 


108 


1 


3400 


1430 


1370 


6575 


15169 


8594 


Thioredoxin {E.coli) 


2TRX 


108 


1 


11 1 n 

J J 1U 




1660 


5847 


14776 


8929 


Barnase 


IRNB 


109 d 


0 


4400 


1940 


1650 


6050 


15093 


9043 


Ribonuclease A 


9RSA 


124 


4 


3100 


1100 


1230 


6965 


16983 


10018 


l\vJ I 


1 POP 

1 I\ W I 


1 Ifx 

1ZO 


u 


2400 


NA 


1 son 






y 1 ^+0 


Che Y ( E.coli) 


3CHY 


128 d 


0 


2260 


1600 


NA 


6673 


17646 


10973 


Lysozyme (hen egg white) 


1AKI 


129 


4 


2330 


1290 


1540 


6755 


17886 


11131 


Lysozyme (human) 


1LZ1 


130 


4 


3460 


NA 


1580 


6777 


18305 


11528 


Fatty acid binding protein (rat) 


I1FC 


131 


0 


4470 


1770 


NA 


7145 


18564 


11419 


Staphylococcal nuclease 


2SNS 


141 d 


0 


6830 


2380 


2320 


8052 


20083 


12031 


Inter leu kin 1-p 


51 IB 


151 


0 


5580 


NA 


1890 


8209 


21 188 


12979 


Apomyoglobin (horse) 


TV1\/TR 


YD J 


1 
1 


3710 


2140 


lo/U 


0Z7O 


9 1 RCK 

Z 1 07J 


1 j jyy 


Apomyoglobin (sperm whale) 


J1V1J31M 


1 

1 jj 


1 

1 


2600 


1460 


9770 
z / /u 


0 jZU 


99 1 sn 

ZZ 1 ou 


iJOOU 


Metmyoglobin (horse) 


IYMB 


153 


0 


NA 


1800 


NA 


8296 


21042 


12746 


Metmyoglobin (sperm whale) 


5MBN 


153 


0 


NA 


2040 


NA 


8320 


21327 


13007 


Ribonuclease H 


2RN2 


155 


0 


4500 


1930 


NA 


8785 


21635 


12850 


Dihydrofolate reductase ( E.coli) 


4DFR 


159 


0 


NA 


1900 


NA 


8717 


21945 


13228 


T4 lysozyme (T54, A97) 


1L63 


162 d 


0 


5500 


2000 


2570 


8553 


22913 


14360 


Gene v protein c 


1VQB 


172 d 


0 


3600 


NA 


NA 


13216 


26728 


13512 


Adenylate kinase (porcine) 


3ADK 


194 


0 


4800 


NA 


NA 


11051 


26978 


15927 


HIV-1 protease 0 


1HVR 


198 


0 


NA 


2050 


NA 


9865 


26784 


16919 


SIV protease c 


1MV 


1 oo 
1 Vo 


u 


NA 


1880 


XT A 


yyoZ 


ZOJ /o 




Trp aporepressor c 


3WRP 


202 d 


0 


NA 


2900 


NA 


11388 


28583 


17195 


a-Chymotrypsin 


4CHA 


239 d 


5 


4100 


2070 


3020 


10742 


31985 


20498 


Chymotrypsinogen A 


2CGA 


245 


5 


4440 


2030 


NA 


10742 


31985 


21243 


Tryptophan synthase, a-subunit 


IBKS 


255 d 


0 


NA 


3750 


4600 


11585 


34271 


22686 


p-Lactamase 


3BLM 


257 d 


0 


7200 


3210 


NA 


11561 


36444 


24883 


Pepsinogen 


2PSG 


370 


3 


NA 


7800 


6090 


14748 


48478 


33730 


Phosphoglycerate kinase (yeast) 


3PGK 


415 


0 


9700 


NA 


7500 


18988 


53051 


34063 



NA: Not Available; a for each protein, the PDB file code, number of residues, and number of disulfides or covalent heme-protein crosslinks is shown. SASA values were calculated by DSSP 
program as described in the text. The 5, 6 and 7th columns give experimental m eq values for GdnHCI or urea denaturation and the observed AC P , for each protein, taken from reference (10) . 
AS ASA values are in A 2 , m eq values in call{mol.M), and ACp, in call{mol.K); b SASA unMded values in this table were calculated using the extended P-strand conformation of all proteins; c Dimer; 
d These values were checked and corrected based on the number of the residues in the corresponding PDB files and hence are different from those reported in Myers et al. (10) . 



The main purpose of this study is to re- 
evaluate the effect of crosslinks on ASASA 
and also predict the m eq and ACp of unfold- 
ing based on protein sequence information. 
These latter two parameters are amongst the 
important criterion indicative of the stability 
of proteins. Therefore, prediction or any im- 



provement in the prediction of these values 
has significant theoretical and practical appli- 
cations. 



The presence of crosslinks such as disulfide 
bonds and heme groups in a protein (as shown 
in table 2) will result in a more compact un- 
folded state, thus reducing the solvent accessi- 
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Figure 1 . Dependence of A) m eq value for Gdn HC1 denaturation, B) m eq value for urea denaturation, and 
C) heat capacity changes upon unfolding on ASASA for the 45 proteins shown in table 1 



bility of the unfolded polypeptide chain. To 
compensate for the effects of crosslinks, 
Myers et al (10) employed the results of dif- 
ferent empirical methods (22) to estimate the 
magnitude of the reduction of solvent acces- 
sible surface area (ASASA) per disulfide bond. 
The reduction of ASASA per crosslink was 
estimated to be about 900A 2 . 

In the current study, to find out more about 
the effect of crosslinking through theoretical 
and computational methods, the different un- 
folded models were generated for crosslink- 
containing proteins while the crosslinks were 
preserved or removed in the unfolded states 
generated by instantaneous unfolding method 
based on assigning standard bond length and 
angle values. Then the SASA values were cal- 
culated for the generated unfolded structural 
models (Table 2). 

To quantitatively indicate the effect of 
crosslinks on ASASA upon unfolding a new 
term called Crosslinking Factor (CLF) was in- 
troduced (CLF was described in Materials and 
Methods section.) Effectively, CLF is a meas- 
ure of reduction in the SASA of unfolded pro- 
tein as a consequence of presence of a single 



crosslink, such as disulfide bond, and calcu- 
lated to be equal to 918.5 A 2 . This value is the 
average of crosslinking numbers calculated 
for 16 crosslink-containing proteins listed in 
table 2 for which the m eq and ACp values were 
available. 

In five proteins listed in the table, cross- 
links are formed via ligation of central ion 
atom of heme groups by sulfur or nitrogen 
atoms of the side chains of the interacting re- 
sidues. The average of crosslinking numbers 
for these proteins (759.0 A 2 ) is smaller than 
the average of the numbers (991.0 A 2 ) for the 
remaining proteins where the crosslinks are 
formed by disulfide bounds. However, the 
difference is not statistically significant (p- 
value >0.05). None of these values are stat- 
istically different from the calculated CLF 
value of 918.5. 

Based on the above findings, the ASASA 
values were corrected for the effect of cross- 
links on the solvent accessibility of the un- 
folded state by taking 918.5 A 2 per crosslink 
off the ASASA (called ASASA corYQCt Qd) and then 
the corrected values were re-correlated to the 
m eq and ACp values. Linear correlation coef- 
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Table 2. List of crosslink-containing proteins used in this study. Differences of SASA values for the unfolded stats in 
two different forms, i.e., with and without conserving the crosslinks, have been shown along with the number of 

crosslinks and crosslinking number for each protein 



PDB code 


SASA unMdtd without 
Crosslinks 3 


SASA unMded with 
Crosslinks 3 


ASASA 


Number of 
crosslinks (n) 


Crosslinking 
number 


1CHO 


7039 


5730 


1309 


3 


436.33 


7PTI 


8278 


6616 


1662 


2 


831.00 


1AAL 


8315 


6539 


1776 


2 


888.00 


5CYT b 


13929 


13392 


537 


1 


537.00 


2PCB b 


14410 


14118 


292 


1 


292.00 


9RNT 


13348 


11227 


2121 


2 


1060.50 


lYCC b 


15145 


14620 


525 


1 


525.00 


2TRX 


13568 


13254 


314 


1 


314.00 


9RSA 


17125 


13015 


4110 


4 


1027.50 


1AKI 


17597 


12575 


5022 


4 


1255.50 


1LZ1 


18698 


12967 


5731 


4 


1430.25 


lYMB b 


22110 


20864 


1246 


1 


1246.00 


5MBN b 


22202 


21007 


1195 


1 


1195.00 


4CHA 


26990 


25585 


1405 


5 


281.00 


2CGA 


27707 


23693 


4014 


5 


802.80 


2PSG 


45170 


37447 


7723 


3 


2574.33 


CLF (equals to the average of crosslinkin 


g numbers)±Standard Error 




918.5±145.1 



a In order to be consistent, the results presented in this table were derived from instantaneous unfolding method using standard bond 
length and angle values for both sets of data labeled "without crosslinks" and "with crosslinks" and then the SASA values were 
calculated using DSSP. b The heme containing proteins 



ficients improved to 0.90, 0.88 and 0.99 for 
GdnHCL and urea m eq as well as ACp values, 
respectively as shown in figure 2. 

The extent of increase in SASA upon un- 
folding of a protein highly depends on the 
number of residues (i.e. protein size) and the 
constraints present in the unfolded state. The 
unfolded state of a protein is populated by an 
ensemble consisting huge number of con- 
formationally distinct species. The presence 
of structural constraints limits the conform- 
ational space available to be explored by the 
protein polypeptide chain. Our analyses, in 
agreement with the results of others (10) , show 
that the amount of area buried in each protein 
correlates very strongly (R=0.99) with the 
number of residues in each protein (Eq. 9). 
The strong correlation between ASASA and 
the number of residues, makes it possible to 



estimate the thermodynamic parameters using 
equations 10 to 12. 

Where k denotes the number of residues for 
a given protein. These equations provide 
means to predict m eq and ACp directly based 
on the primary structure information. The re- 
sults of experimental studies are in close 
agreement with the results of our theoretical 
calculations which indicate the important 
thermodynamic parameters can be predicted 
using ASASA upon unfolding and taking into 
account the presence of crosslinks in the pro- 
tein. 

In a different approach to estimate SASA of 
unfolded state of proteins, we have used MD 
to simulate the unfolding behavior of proteins 
in denaturing condition, as stated in Materials 
and Methods section. Four of the proteins in 
our dataset (listed in Table 3) were subjected 



ASASA = 90.01 x k - 804.55 , R = 0.99; N = 45(Eq.9) 

m f ^ = 22.55 x k - 0.362(«xCLF) + 835.25 , R = 0.89; N = 34(Eq.lO) 

eq(CrdnHCl) 

m /TT = 17.55 x k - 0.173(> x CLF) - 517.92 , R = 0.92 , TV = 34(Eq.ll) 

eq(Urea) v ' v 1 ' 

ACp = 18.48 x k - 0.163(n x CLF) - 327.208 , R = 0.99, N = 25(Eq.l2) 
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Figure 2. Dependence of A) m eq value for GdnHCI denaturation, B) value for urea denaturation, and C) 
heat capacity changes upon unfolding on ASASA after correction for the effect of crosslinks by taking out 
918.5 A 2 per crosslink for the 45 proteins in our data set (see text for further explanation) 



to MD simulations for 10 ns at 500 °K while 
inserted in a solvation box filled by a mixture 
of water and urea molecules. As can be seen 
from the table, the maximum SASA values for 
the unfolded conformations of proteins ob- 
tained by MD are smaller than that achieved 
by non- simulation method. Consequently, the 
ASASA values are also relatively smaller. 

Figure 3 shows the snapshots of conform- 
ational changes during unfolding simulation 
of IgG binding domain of protein G (IBPG) 
which has 56 residues with no crosslink. As 
time evolves, both tertiary and secondary 
structures of IBPG are lost and at the same 
time its SASA increases. The maximum SASA 
achieved during 10 ns is 5772 A 2 which is less 
than that estimated for fully extended con- 



formation (8143 A 2 ). 

The presence of crosslinks in the unfolded 
state will result in a more compact unfolded 
form and the higher the number of crosslinks, 
the more pronounced is this effect. For ex- 
ample, as shown in figure 4, the unfolded 
conformation of lysozyme (hen egg white), a 
129-residue protein with four disulfide bonds, 
retained more globular shape at the end of 
MD simulation, although it loses the elements 
of secondary structures. 

As shown in table 3, the SASAs of the 
investigated proteins increased at the end of 
MD simulation. However, the extent of this 
increase is bigger for the protein with no 
crosslink. For example 1PGB which is a 56- 
residue protein without any crosslink showed 



Table 3. Comparison of SASA and ASASA values obtained by different methods used to unfold the 

proteins 





SASA of native 
structure 


SASA of the unfolded model 


ASASA of the unfolding 


PDB code 


Unfolding method 


Unfolding method 




MD simulation 


Instantaneous 


MD simulation 


Instantaneous 


1AKT 


6755 


10412 


12575 


3657 


5820 


lAAL b 


3993 


5579 


6539 


1586 


2546 


2TRX C 


5847 


9135 


13254 


3288 


7407 


lPGB d 


3752 


5772 


8143 


2020 


4391 



a, b, c and d are 4, 2, 1, and zero, respectively and denote the number of crosslinks 
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Figure 3. Molecular dynamics simulation of IgG binding 
domain of protein G (PDB code 1PGB) solvated in 4.4 M 
urea in water at 500 °K for 10 ns using GROMOS-96 force 
field parameters. The non-protein molecules {i.e. water and 
urea) are not shown for the sake of clarity 




0 its 0.5 ns 6 #t s 



Figure 4. Molecular dynamics simulation of lysozyme 
(hen egg white) (PDB code 1AKI) solvated in 4.4 M urea 
in water at 500 °K for 10 ns using GROMOS-96 force 
field parameters 

54% increase in SASA upon unfolding using 
MD method. However, applying the same un- 
folding condition on 1 AAL, a protein with al- 
most equal size (i.e. 58 residues) and two 
disulfide bond has led to only 40% increase in 
SASA. In all studied cases, the maximum 
SASA for unfolded conformations achieved by 
MD are smaller than that of instantaneous 
method. 

Analyses of MD trajectories showed that 
the RMSD differences for C atoms increases 
as time evolves approaching high values in 
the range of ~ 14-19 A for the studied proteins 
during the simulation. The rate of RMSD in- 
crease was dramatically fast for 1PGB and 
2TRX, with no and one crosslink, respective- 
ly. However, the rate was gradual in the case 
of 1AA1 and 1AKI with two and four cross- 
links, respectively. 

Although MD simulation under the condi- 
tion used in this study can unfold the proteins 
and also demonstrates the effect of crosslink, 
however, using this method the SASA of un- 
folded conformations never reached to the 
SASA values of the unfolded conformations 
obtained by instantaneously decomposing the 
protein native structure just by taking into 



consideration to preserve the standard bond 
lengths, bond angles and other standard chem- 
ical structure geometries. This could be due to 
insufficient simulation time or entrapment of 
protein in an ensemble of conformations in a 
local minimum of energy landscape. How- 
ever, to find out more about these issues and 
draw more sensible conclusion, further com- 
putational experiments such as hydrodynamic 
simulation are required. 

Crosslinks such as disulfide bonds and 
heme groups have profound effect on the con- 
formational flexibility and SASA values of un- 
folded state and hence influence the stability 
of the proteins by decreasing the entropy gain 
as well as reduction of ASASA upon unfold- 
ing. Studies of proteins with chemical cross- 
links have shown clearly that the major effect 
of the crosslink on the stability results from a 
decrease in the conformational entropy of the 
unfolded molecule (23 24) . 

On the other hand, attempts to increase the 
stability of proteins through introducing disul- 
fide bonds suggest that the structural re- 
straints in the native state due to the cross- 
links may also make an important contribu- 
tion to the net effect of the crosslinks on the 
stability (25) . Furthermore, inspection of the 
model structures of Micro-myoglobin (Mb) 
revealed a role for heme in stabilizing the 
folded state (26) . 

Doig and Williams (27) investigated the ef- 
fect of disulfide crosslinks on hydrophobicity 
derived stability of proteins. Based on data 
obtained from solvent transfer experiment, 
they calculated the non-polar ASASA to be 
590 and 690 A 2 per disulfide bond according 
to free energy of hydration and AC P measure- 
ments, respectively. Taking into account that 
the fraction of total area buried which is non- 
polar is about 0.70, these values correspond to 
a reduction in the total area change of 850 A 2 
and 990 A 2 per disulfide. Using solvent per- 
turbation difference spectroscopy, Pace et al 
demonstrated that the solvent accessibility of 
the aromatic residues (Tyr and Trp) in three 
studied proteins (lysozyme, RNase A and 
RNase Tl) was changed upon unfolding (22) . 
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Table 4. Thermodynamic parameters of proteins predicted based on different methods 



PDB code 




iviyers 




Predicted values using 
equations 13 to 15 b 


Experimental 0 






mGdnHCL 


mUrea 


ACp 


mGdnHCLpre 


niUreap re 


AO?,.,,- 

pre 


mGdnHCL 


mUrea 


ACp 


1CHO 


1069.3 


-94.1 


-71.9 


1117.5 


-29.4 


134.2 


580.0 


250.0 


590.0 


7PTI 


1502.6 


261.3 


366.8 


1 6 4 

1 J lU.t 




444 7 
ttt. / 


1 900 0 


NA 


NA 


1AAL 


1478.4 


261.0 


366.4 


14Q1 R 

It" 1 . o 


ZHO. J 


444 7 


i ^nn n 

1 JUU.U 


XTA 


NA 


5CYT 


2982.8 


1283.7 


1559.3 


Z5 J*+. 1 


1 1 Al Q 


1 A1& 1 
HZO.Z 


ZoUU.U 


XT A 


NA 


2PCB 


3017.1 


1315.3 


1584.6 




I 1 CO f\ 

I I jo.U 


14JU.2 


tai rv rv 

JU1U.U 


1 inn a 


1730.0 


9RNT 


2490.1 


939.1 


1168.1 


2528.5 


1007.8 


1296.3 


2560.0 


1210.0 


1270.0 


1YCC 


2813.1 


1174.8 


1448.4 


2930.0 


1218.6 


1525.8 


3400.0 


1430.0 


1370.0 


2TRX 


2906.8 


1242.6 


1507.0 


2933.1 


1224.2 


1511.7 


3310.0 


1300.0 


1660.0 


9RSA 


2476.8 


951.5 


1177.8 


2212.8 


1077.7 


1392.4 


3100.0 


1100.0 


1230.0 


1AKI 


2790.8 


1156.5 


1411.0 


2442.8 


1165.9 


1456.8 


2330.0 


1290.0 


1540.0 


1LZ1 


2874.4 


1227.0 


1488.9 


2315.6 


1 182.8 


1453.9 


3460.0 


NA 


1580.0 


1YMB 


4147.5 


2071.2 


2506.9 


3992.2 


1998.9 


2390.2 


3710.0 


2140.0 


1870.0 


5MBN 


4258.2 


2140.5 


2523.8 


4028.9 


2019.7 


2351.2 


2600.0 


1460.0 


2770.0 


4CHA 


5039.8 


2685.9 


3171.5 


4770.5 


3152.4 


3468.4 


4100.0 


2070.0 


3020.0 


2CGA 


5230.5 


2830.7 


3317.1 


4838.6 


3307.1 


3450.4 


4440.0 


2030.0 


NA 


2PSG 


8892.8 


4129.0 


6354.9 


8203.9 


3964.0 


6045.6 


NA 


7800.0 


6090.0 


Correlation Coefficient^ 


0.7293 


0.7082 


0.9751 


0.6996 


0.7519 


0.9596 








MAPE 


21.5 


31.8 


17.9 


20.5 


31.1 


16.8 








SDEP 


646.5 


1128.3 


300.9 


612.1 


1226.0 


294.8 









a: Prediction of heat capacity changes and m eq values for GdnHCL and Urea upon unfolding based on Myers' equations (10) . b: Same predictions using 
equations 13-15. c: Experimental data which are compiled from the literature and taken from reference (10) . d: Correlation coefficient between predicted 
and experimental values 



Myers et al, used these experimental data to 
estimate an approximate average value of 900 
A 2 reduction in A&4&4 un f 0 ided per disulfide, as- 
suming a universal change in accessibility 
across all residue types (10) . 

The results of these experimental methods 
have been averaged and used by Myers et al 
to compensate for the effects of crosslinks on 
AS AS A. However, it may suffer from an over 
simplification by using only the changes in 
accessibility of just two amino acids and ex- 
trapolating these changes to the total area. It 
should also be mentioned that these results 
have been concluded from very limited num- 
ber of experiments performed on only three 
globular proteins (22) . 

One of the shortcomings of using either 
CLF, introduced in this work, or experimen- 
tally derived value of 900, introduced by 
Myers et al to compensate for the effects of 



crosslinks on ASASA and hence estimation of 
m eq and ACp values is the scarcity of the data 
used. The correction value close to 900 A 2 
(proposed by Myers and here as CLF) can be 
justified by fitting equations 12 to 14 pre- 
sented in Myers et al where the disulfide bond 
corrections that maximize the fits are all close 
to 900 A 2 . However, there is no need to use a 
correction factors such as 900 A 2 proposed by 
Myers or CLF to account for the effects of 
crosslinks on m eq or ACp. Although we be- 
lieve the correction factor most likely is close 
to 900 A 2 , but it is not a magic number and 
any other value close to that can be used to do 
the correction and then draw empirical equa- 
tions to relate m eq or ACp to the corrected 
ASASA (or to the combination of number of 
amino acids and CLF as we used in here). 

The coefficients in the final mathematical 
equations will be adjusted to balance out any 



m eq(GdnHCl) = 835,25 " 325,49 X n + 22,55 X k R = 0,889; N = 34 ( Ec l- 13 ) 
m eq{Urea) = _517 - 92 " 155 - 85 x n + 17 - 55 xk R = °- 878 > N = 34 ( Ec l- 14 ) 
ACp = -327.21 - 149.49 x n + 18.48 x kR = 0.990, N = 25 (Eq.15) 



Avicenna Journal of Medical Biotechnology, Vol. 4, No. 1 January-March 2012 



Hamzeh-Mivehroud M, et al 



changes in the value of correction factor. In a 
situation where the ultimate aim is to be able 
to predict the thermodynamic parameters as 
precise as possible, one may decide to use dif- 
ferent structural descriptors to derive empir- 
ical equations for the prediction purposes. 

To this end we have furthered our investi- 
gation by trying to develop different empirical 
equations to predict m eq and ACp values. We 
have examined the effects of different struc- 
tural properties such as number of amino 
acids, number of crosslinks, size of loops re- 
presenting the total number of amino acids 
involved in the loops formed by crosslinks, 
and the central position of the regions in the 
loop area on the prediction of the thermo- 
dynamic properties. The best statistical Mul- 
tiple Linear Regression (MLR) models were 
achieved using variables representing total 
number of amino acids and the number of 
crosslinks. 

In order to test the predictive power of 
these models, the Leave One Out (LOO) cross 
validation method was used. The mean ab- 
solute percentage error of predictions (MAPE) 
of m^(GdnHcl), m^(Urea) and ACp values 
for all proteins listed in table 1 (or proteins 
with crosslinks listed in table 4) based on 
Myers' models are 19.7 (21.5), 22.9 (31.8) 
and 13.8 (17.9). The corresponding MAPEs 
using models presented in equations 13, 14 
and 15 are 22.3 (20.5), 23.6 (31.1) and 13.5 
(16.8), respectively. 

The results show that both methods are not 
statistically different in predicting the evalu- 
ated thermodynamic parameters, either for all 
data points (proteins in Table 1) or for the 
proteins with crosslinks (i.e. values indicated 
inside the brackets), and simple MLR equa- 
tions based on limited number of structural 
descriptors, i.e. number of residues and num- 
ber of crosslinks, are able to perform equally 
well. In fact equations 13 to 15 are identical 
to equations 10 to 12 and the only difference 
is the way to represent the effect of crosslink 
on the parameter of interest. For example in 
equation 13 the coefficient of variable n (i.e. 
number of crosslinks) equals to the coefficient 



of the second term on the right hand side of 
the equation 10 multiplied by the value of 
CLF. These equations are also too close to the 
equations proposed by Myers et al (eqs. 12 to 
14 in reference 10). For instance, the coef- 
ficient of variable n in equation 14 above (i.e. 
155.58) is very close to the 139.3 calculated 
by multiplying 0.14 and 995 in equation 13 in 
Myers' study (10) . 

Conclusion 

In summary, it can be concluded that the 
proposed relationships represent valuable 
tools for predicting thermodynamic para- 
meters of protein folding using the primary 
sequence information. The proposed cross- 
linking factor (CLF; which shows the effect 
of a single crosslink on ASASA upon unfold- 
ing) of 918.5 A 2 obtained based on com- 
putational simulation is very close to the pre- 
viously published experimentally derived 
value of 900 A 2 . Such a correction factor can 
be used to estimate the ASASA upon unfolding 
which in turn can be used for the prediction of 
thermodynamic parameters such as m eq and 
ACp. For the prediction of these parameters, 
one may also use number of amino acids (k) 
and number of crosslinks (n) without need to 
any kind of correction factor. 

Although the correction factor for the effect 
of crosslink on ASASA is a quantitative value 
describing a fundamental property in protein 
folding, however, for the prediction purposes, 
the use of more simple properties taken from 
the primary structure of proteins gives as well 
accurate results. In addition, the current work 
demonstrates an example where theory is cap- 
able of reproducing the results obtained from 
experimental works. 
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